Collocation Map for Overcoming Data Sparseness
Authors
Abstract
Statistical language models are useful because they provide probabilistic information for decision making under uncertainty. The most common statistic is the n-gram, which measures word co-occurrences in texts. The method suffers from a data sparseness problem, however. In this paper, we suggest that Bayesian networks be used to approximate, with graceful degradation, the statistics of pairs that occur too rarely in the sample texts and of those that do not occur at all. A Collocation map is a sigmoid belief network that can be constructed from bigrams. We compared the conditional probabilities and mutual information computed from bigrams with those computed from the Collocation map. The results show that, for infrequent pairs, the variance of the values from the Collocation map is 48% smaller than that from the frequency measure. The predictive power of the Collocation map for arbitrary associations not observed in the sample texts is also demonstrated.
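To make the baseline concrete, the following is a minimal sketch of the frequency-based measures the abstract refers to: conditional probability and mutual information estimated directly from bigram counts. It is not the paper's Collocation map; the function names, the toy corpus, and the choice of pointwise mutual information with a base-2 logarithm are assumptions made for illustration. It also shows where sparseness bites: an unseen pair gets probability zero and an undefined mutual information.

```python
import math
from collections import Counter


def bigram_stats(tokens):
    """Count unigrams and adjacent-word bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def conditional_probability(unigrams, bigrams, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) = count(w1, w2) / count(w1).

    Returns 0.0 when w1 never occurs. Unseen pairs also get 0.0, which is
    exactly the data sparseness problem described in the abstract.
    """
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]


def pointwise_mutual_information(unigrams, bigrams, w1, w2):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ).

    Returns None for pairs that never co-occur, because the logarithm
    of zero is undefined for a pure frequency estimate.
    """
    pair_count = bigrams[(w1, w2)]
    if pair_count == 0:
        return None
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    p_pair = pair_count / n_bigrams
    p_w1 = unigrams[w1] / n_unigrams
    p_w2 = unigrams[w2] / n_unigrams
    return math.log2(p_pair / (p_w1 * p_w2))


# Hypothetical toy corpus, used only to exercise the functions.
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = bigram_stats(tokens)

seen = conditional_probability(unigrams, bigrams, "the", "cat")
unseen = conditional_probability(unigrams, bigrams, "the", "dog")
pmi_seen = pointwise_mutual_information(unigrams, bigrams, "the", "cat")
pmi_unseen = pointwise_mutual_information(unigrams, bigrams, "the", "dog")
```

In this sketch the pair ("the", "dog") receives a conditional probability of zero and no mutual information value at all, even though such an association is plausible. A model that degrades gracefully, as the abstract proposes, would instead assign such pairs a small but nonzero estimate.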
Similar resources
Dynamic Coupling Map: Acceleration Space Analysis for Underactuated Robots
Humans and animals are capable of overcoming complex terrain challenges with graceful and agile movements. One of the key ingredients for such complex behaviors is motion coordination to exploit passive dynamics. We present a direct collocation trajectory optimization to find optimal control policy and generate optimal trajectory for the swing up motion of a gymnast on high bar. Notwithstanding...
SOLVING SINGULAR ODES IN UNBOUNDED DOMAINS WITH SINC-COLLOCATION METHOD
Spectral approximations for ODEs in unbounded domains have only received limited attention. In many applicable problems, singular initial value problems arise. In solving these problems, most of numerical methods have difficulties and often could not pass the singular point successfully. In this paper, we apply the sinc-collocation method for solving singular initial value problems. The ability...
Target Word Selection Using WordNet and Data-Driven Models in Machine Translation
Collocation information plays an important role in target word selection of machine translation. However, a collocation dictionary fulfills only a limited portion of selection operation because of data sparseness. To resolve the sparseness problem, we proposed a new methodology that selects target words after determining an appropriate collocation class by using an inter-word semantic similarity...
Collocations as Word Co-ocurrence Restriction Data - An Application to Japanese Word Processor
Collocations, the combination of specific words, are quite useful linguistic resources for NLP in general. The purpose of this paper is to show their usefulness, exemplifying an application to Kanji character decision processes for Japanese word processors. Unlike recent trials of automatic extraction, our collocations were collected manually through many years of intensive investigation of corp...
The Application of Fuzzy Logic to Collocation Extraction
Collocations are important for many tasks of Natural language processing such as information retrieval, machine translation, computational lexicography etc. So far many statistical methods have been used for collocation extraction. Almost all the methods form a classical crisp set of collocation. We propose a fuzzy logic approach of collocation extraction to form a fuzzy set of collocations in ...